Combining Neural Language Models for Word Sense Induction
Word sense induction (WSI) is the problem of grouping occurrences of an
ambiguous word according to the expressed sense of this word. Recently a new
approach to this task was proposed, which generates possible substitutes for
the ambiguous word in a particular context using neural language models, and
then clusters sparse bag-of-words vectors built from these substitutes. In this
work, we apply this approach to the Russian language and improve it in two
ways. First, we propose methods of combining left and right contexts, resulting
in better substitutes generated. Second, instead of fixed number of clusters
for all ambiguous words we propose a technique for selecting individual number
of clusters for each word. Our approach established new state-of-the-art level,
improving current best results of WSI for the Russian language on two RUSSE
2018 datasets by a large margin.Comment: International Conference on Analysis of Images, Social Networks and
Texts AIST 2019: Analysis of Images, Social Networks and Texts, pp 105-12
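The pipeline described above (generate substitutes for each occurrence, build sparse bag-of-words vectors over them, and cluster with a per-word number of clusters) can be sketched roughly as follows. The toy substitute lists, the clustering algorithm, and the silhouette-based selection of the cluster count are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of substitute-based WSI with hypothetical data; the paper's actual
# substitute generation uses neural LMs conditioned on left/right contexts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# One substitute list per occurrence of the ambiguous word "bank".
substitutes = [
    "institution lender firm company",
    "lender institution branch office",
    "shore slope riverside edge",
    "shore edge riverbank slope",
]

# Sparse bag-of-words vectors over the substitute vocabulary.
X = CountVectorizer().fit_transform(substitutes).toarray()

# Select the number of clusters for this word via silhouette score
# instead of using one fixed k for all ambiguous words.
best_k, best_score, best_labels = None, -1.0, None
for k in range(2, len(substitutes)):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(best_k, list(best_labels))
```

With these four occurrences the silhouette criterion picks two clusters, separating the financial-sense substitutes from the river-sense ones.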
HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
We present our system for semantic frame induction that showed the best
performance in Subtask B.1 and finished as the runner-up in Subtask A of the
SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et
al., 2019). Our approach separates this task into two independent steps: verb
clustering using word and context embeddings, and role labeling by combining
these embeddings with syntactic features. A simple combination of these steps
shows very competitive results and can be extended to process other datasets
and languages.
Comment: 5 pages, 3 tables, accepted at SemEval 201
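The first of the two steps above (verb clustering over concatenated word and context embeddings) can be illustrated with a minimal sketch. The vectors, cluster count, and clustering algorithm below are made-up assumptions; the real system uses contextualized embeddings rather than these hand-built toy vectors:

```python
# Toy sketch of verb clustering for frame induction.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical occurrences: (verb, word embedding, context embedding).
# Two intended frames: commerce verbs vs. motion verbs.
occurrences = [
    ("buy",  np.array([1.0, 0.0]), np.array([0.9, 0.1])),
    ("sell", np.array([0.9, 0.1]), np.array([1.0, 0.0])),
    ("walk", np.array([0.0, 1.0]), np.array([0.1, 0.9])),
    ("run",  np.array([0.1, 0.9]), np.array([0.0, 1.0])),
]

# Step 1: cluster verbs on concatenated word + context embeddings.
X = np.stack([np.concatenate([w, c]) for _, w, c in occurrences])
frames = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2 (not shown): role labeling would combine these embeddings with
# syntactic features of each argument and cluster argument instances.
print(dict(zip([v for v, _, _ in occurrences], frames)))
```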
Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
Lexical substitution, i.e. generation of plausible words that can replace a
particular target word in a given context, is an extremely powerful technology
that can be used as a backbone of various NLP applications, including word
sense induction and disambiguation, lexical relation extraction, data
augmentation, etc. In this paper, we present a large-scale comparative study of
lexical substitution methods employing both older and the most recent language
and masked language models (LMs and MLMs), such as context2vec, ELMo, BERT,
RoBERTa, and XLNet. We show that the already competitive results achieved by
SOTA LMs/MLMs can be further substantially improved if information about the
target word is injected properly. Several existing and new target word
injection methods are compared for each LM/MLM using both intrinsic evaluation
on lexical substitution datasets and extrinsic evaluation on word sense
induction (WSI) datasets. On two WSI datasets we obtain new SOTA results. In
addition, we analyze the types of semantic relations between target words and
their substitutes generated by different models or given by annotators.
Comment: arXiv admin note: text overlap with arXiv:2006.0003
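One way target-word information can be injected is to re-rank the model's context-conditioned substitute distribution by embedding similarity to the target. The multiplicative combination, the vocabulary, the probabilities, and the embeddings below are all illustrative assumptions, not the paper's specific injection methods:

```python
# Sketch of target-word injection via similarity re-ranking.
import numpy as np

vocab = ["cash", "shore", "slope", "tree"]
# Hypothetical MLM distribution for "the steep [MASK] of the river".
p_context = np.array([0.10, 0.40, 0.30, 0.20])

# Hypothetical static embeddings; "bank" is the target word being replaced.
emb = {
    "bank":  np.array([0.7, 0.7]),
    "cash":  np.array([1.0, 0.0]),
    "shore": np.array([0.0, 1.0]),
    "slope": np.array([0.1, 0.9]),
    "tree":  np.array([0.5, 0.5]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Multiply context probabilities by similarity to the target, renormalize.
sim = np.array([max(cos(emb["bank"], emb[w]), 0.0) for w in vocab])
p = p_context * sim
p /= p.sum()

ranked = [vocab[i] for i in np.argsort(-p)]
print(ranked)
```

Without injection, context alone cannot tell which sense of the removed target is intended; the similarity term biases the ranking toward substitutes compatible with the target word itself.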
Negative sampling improves hypernymy extraction based on projection learning
We present a new approach to the extraction of hypernyms based on projection learning and word embeddings. In contrast to classification-based approaches, projection-based methods require no candidate hyponym-hypernym pairs. While it is natural to use both positive and negative training examples in supervised relation extraction, the impact of negative examples on hypernym prediction has not been studied so far. In this paper, we show that explicit negative examples used for regularization of the model significantly improve performance compared to the state-of-the-art approach of Fu et al. (2014) on three datasets from different languages.
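The core idea can be sketched as follows: learn a matrix P that projects a hyponym embedding toward its hypernym embedding, with an extra penalty keeping the projection away from a negative example (here the hyponym itself, one common choice). The 2-d vectors, loss weights, and plain gradient loop are illustrative, not the paper's exact training setup:

```python
# Toy sketch of projection learning with a negative-example regularizer.
import numpy as np

x = np.array([1.0, 0.0])   # hyponym embedding, e.g. "poodle"
y = np.array([0.0, 1.0])   # hypernym embedding, e.g. "dog"
neg = x                    # negative example: the word itself

P = np.zeros((2, 2))
lr, lam = 0.1, 0.5         # learning rate and negative-term weight
for _ in range(300):
    pred = P @ x
    # Gradient of  ||P x - y||^2  -  lam * ||P x - neg||^2  w.r.t. P:
    # the first term pulls the projection toward the hypernym, the
    # second pushes it away from the negative example.
    grad = 2 * np.outer(pred - y, x) - 2 * lam * np.outer(pred - neg, x)
    P -= lr * grad

pred = P @ x
print(np.round(pred, 2))
```

After training, the learned projection of the hyponym lands closer to the hypernym vector than to the hyponym's own vector, which is the behavior the regularizer enforces.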
Construction of fairways and reconstruction of channels using rotary-bucket dredgers and calculation of soil-collecting devices
Rotary-bucket dredgers are used in a variety of operations: dredging, mining, and the development of all types of soil. Despite their high weight, cost, and structural complexity, they are increasingly used in underwater soil development owing to their versatility and high efficiency. The article presents a method for calculating rotary-bucket dredgers that takes their underwater placement into account.
RUSSE'2018: a shared task on word sense induction for the Russian language
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic languages, such as rich morphology and free word order. The participants were asked to group contexts containing a given word according to its senses, which were not provided beforehand. For instance, given the word “bank” and a set of contexts with this word, e.g. “bank is a financial institution that accepts deposits” and “river bank is a slope beside a body of water”, a participant was asked to cluster such contexts into a number of clusters, not known in advance, corresponding in this case to the “company” and the “area” senses of the word “bank”. For the purpose of this evaluation campaign, we developed three new evaluation datasets based on sense inventories with different sense granularity. The contexts in these datasets were sampled from Wikipedia texts, the academic corpus of Russian, and an explanatory dictionary of Russian. Overall, 18 teams participated in the competition, submitting 383 models. Multiple teams managed to substantially outperform competitive state-of-the-art baselines from the previous years based on sense embeddings.
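Scoring in such WSI evaluations compares an induced clustering of contexts against gold sense labels with a clustering measure; adjusted Rand index (ARI) is a standard choice for this kind of comparison. A minimal sketch with illustrative labels for the "bank" example:

```python
# Compare an induced clustering of contexts against gold sense labels.
from sklearn.metrics import adjusted_rand_score

# Gold senses for five contexts of "bank" (labels are illustrative).
gold = ["company", "company", "area", "area", "area"]
# Induced cluster ids; one "area" context ended up mis-clustered.
induced = [0, 0, 1, 1, 0]

# ARI is 1.0 for a perfect clustering, ~0.0 for a random one,
# and is invariant to how the induced clusters are numbered.
print(round(adjusted_rand_score(gold, induced), 3))
```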